Debiased Machine Learning without Sample-Splitting for Stable Estimators
Estimation and inference on causal parameters is typically reduced to a generalized method of moments problem, which involves auxiliary functions that correspond to solutions of a regression or classification problem. A recent line of work on debiased machine learning shows how generic machine learning estimators can be used for these auxiliary problems while maintaining asymptotic normality and root-$n$ consistency of the target parameter of interest, requiring only mean-squared-error guarantees from the auxiliary estimation algorithms. The literature typically requires that these auxiliary problems be fitted on a separate sample or in a cross-fitting manner. We show that when the auxiliary estimation algorithms satisfy natural leave-one-out stability properties, sample splitting is not required. This allows for sample re-use, which can be beneficial in moderately sized sample regimes. For instance, we show that the stability properties we propose are satisfied by ensemble bagged estimators built via sub-sampling without replacement, a popular technique in machine learning practice.
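The no-sample-splitting recipe described above can be sketched concretely: fit the nuisance (auxiliary) regressions with a subsampled ensemble on the full sample, then plug the in-sample fits into an orthogonal (residual-on-residual) moment. This is a minimal numpy sketch, not the paper's estimator; the OLS base learner, the partially linear model, and all names are illustrative.

```python
import numpy as np

def bagged_predict(X, y, n_estimators=50, subsample_frac=0.5, seed=0):
    """Bagged regressor: each base learner is OLS fit on a random subsample
    drawn WITHOUT replacement; predictions are averaged.  Subsampling
    without replacement is the stability mechanism the abstract highlights."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = max(1, int(subsample_frac * n))
    Xb = np.column_stack([np.ones(n), X])  # add intercept
    preds = np.zeros(n)
    for _ in range(n_estimators):
        idx = rng.choice(n, size=m, replace=False)
        beta, *_ = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)
        preds += Xb @ beta
    return preds / n_estimators

# Partially linear model: Y = theta*D + g(X) + eps,  D = m(X) + eta.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))
D = X[:, 0] + rng.normal(size=n)
theta_true = 1.5
Y = theta_true * D + 2 * X[:, 0] + rng.normal(size=n)

# Nuisance fits on the FULL sample (no cross-fitting), as permitted for
# stable (bagged, subsampled) learners.
m_hat = bagged_predict(X, D)
ell_hat = bagged_predict(X, Y)

# Neyman-orthogonal residual-on-residual estimate of theta.
Dres, Yres = D - m_hat, Y - ell_hat
theta_hat = (Dres @ Yres) / (Dres @ Dres)
```

Because the moment is orthogonal, small in-sample errors in `m_hat` and `ell_hat` enter only through their product, which is the mechanism that stability (rather than sample splitting) is protecting here.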
Machine Unlearning of Traffic State Estimation and Prediction
Wang, Xin, Rockafellar, R. Tyrrell, Ban, Xuegang
Traffic State Estimation and Prediction (TSEP) has been extensively studied to reconstruct traffic state variables (e.g., flow, density, speed, travel time) from partially observed traffic data (Antoniou et al., 2013; Ban et al., 2011; Shi et al., 2021; Li et al., 2020). In recent years, advancements in data collection technologies have enabled TSEP methods to integrate traffic data from diverse sources for more accurate and robust estimation and prediction (Wang et al., 2016; Makridis and Kouvelas, 2023). These data sources can be broadly categorized into infrastructure-collected data and user-contributed data. Infrastructure-collected data typically comes from loop detectors, traffic cameras, and radars installed on roadways or at intersections. In contrast, user-contributed data is derived from individuals, often through vehicles or personal devices, such as GPS traces, vehicle trajectories, and probe data collected via mobile apps or in-vehicle systems.
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > California (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Transportation > Ground > Road (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
A Organization of the Appendices
In the Appendix, we give proofs of all results from the main text. We say a function $f: \mathbb{R} \times \mathcal{Y} \to \mathbb{R}$ is $M$-Lipschitz if for any $y \in \mathcal{Y}$ and $\hat{y}, \hat{y}' \in \mathbb{R}$ we have $|f(\hat{y}, y) - f(\hat{y}', y)| \le M|\hat{y} - \hat{y}'|$. We can also define the Moreau envelope of a function $f: \mathbb{R} \times \mathcal{Y} \to \mathbb{R}$ by $f_\lambda(\hat{y}, y) = \inf_{u \in \mathbb{R}} \left\{ f(u, y) + \tfrac{1}{2\lambda}(\hat{y} - u)^2 \right\}$. The proofs of all results in this section extend straightforwardly to these settings. The Moreau envelope is a classical tool in convex analysis (Boyd et al., 2004; Bauschke and Combettes, 2011; Rockafellar, 1970), but is also useful here. Interestingly, there is a similar equivalent characterization for Lipschitz functions as well. Finally, we show that any smooth loss is square-root-Lipschitz, so the class of square-root-Lipschitz losses is more general than the class of smooth losses studied in Srebro et al. (2010).
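The final claim, that any smooth loss is square-root-Lipschitz, follows from the standard self-bounding property of smooth nonnegative functions. A short derivation, under the assumption (as in Srebro et al. 2010) that $u \mapsto f(u, y)$ is $H$-smooth and nonnegative:

```latex
% Self-bounding: minimize the smoothness upper bound over a gradient step.
% Since f \ge 0,
\[
  0 \le f\!\left(u - \tfrac{1}{H} f'(u)\right)
    \le f(u) - \tfrac{1}{2H} f'(u)^2
  \quad\Longrightarrow\quad
  |f'(u)| \le \sqrt{2 H f(u)}.
\]
% Hence, wherever f(u) > 0,
\[
  \left| \frac{d}{du} \sqrt{f(u)} \right|
  = \frac{|f'(u)|}{2\sqrt{f(u)}}
  \le \sqrt{\frac{H}{2}},
\]
% so \sqrt{f(\cdot, y)} is \sqrt{H/2}-Lipschitz.
```

The converse fails: the squared hinge loss, for example, is square-root-Lipschitz but not smooth in general settings, which is the sense in which the square-root-Lipschitz class is strictly larger.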
Appendix A Proofs for Section 2
We construct a "ghost" point. From Section 4.5 of [4], and from Lemma 3.1 and Proposition 3.2 in [48], we obtain the required bounds. The last relationship we want to show is exactly equation (13). We separate the discussion into deterministic and stochastic settings; the total complexity is then $KT$. By Corollary 3.2 and the discussion in Section 3.2, Algorithm 1 combined with EG/OGDA can solve such auxiliary problems. We implement these algorithms in the same way as in Section 5, and compare EG and Catalyst-EG under the same stepsizes in Figure 4(a) (distance to the limit point).
- North America > United States > Illinois (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Information Technology > Security & Privacy (0.46)
- Materials > Chemicals > Specialty Chemicals (0.45)
Clust-Splitter – an Efficient Nonsmooth Optimization-Based Algorithm for Clustering Large Datasets
Lampainen, Jenni, Joki, Kaisa, Karmitsa, Napsu, Mäkelä, Marko M.
Clustering is a fundamental task in data mining and machine learning, particularly for analyzing large-scale data. In this paper, we introduce Clust-Splitter, an efficient algorithm based on nonsmooth optimization, designed to solve the minimum sum-of-squares clustering problem in very large datasets. The clustering task is approached through a sequence of three nonsmooth optimization problems: two auxiliary problems used to generate suitable starting points, followed by a main clustering formulation. To solve these problems effectively, the limited memory bundle method is combined with an incremental approach to develop the Clust-Splitter algorithm. We evaluate Clust-Splitter on real-world datasets characterized by both a large number of attributes and a large number of data points and compare its performance with several state-of-the-art large-scale clustering algorithms. Experimental results demonstrate the efficiency of the proposed method for clustering very large datasets, as well as the high quality of its solutions, which are on par with those of the best existing methods.
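The minimum sum-of-squares clustering (MSSC) objective that Clust-Splitter targets can be written down directly; the inner minimum over centers is what makes the objective nonsmooth (and nonconvex), which is why bundle-type nonsmooth solvers apply. A minimal numpy sketch of the objective, not of the Clust-Splitter algorithm itself:

```python
import numpy as np

def mssc_objective(centers, X):
    """MSSC objective: f(C) = (1/n) * sum_i min_j ||x_i - c_j||^2.
    The pointwise minimum of smooth quadratics is nonsmooth wherever
    a data point is equidistant from two centers."""
    # pairwise squared distances, shape (n_points, n_centers)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

# tiny check: two well-separated pairs, centers at the pair midpoints
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
centers = np.array([[0.0, 0.5], [10.0, 0.5]])
val = mssc_objective(centers, X)   # each point is 0.5 from its center
```

The auxiliary problems mentioned in the abstract optimize closely related functions over one new center at a time, which is what makes the incremental strategy cheap per step.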
- Europe > Finland > Southwest Finland > Turku (0.04)
- North America > United States > New York (0.04)
- Europe > Denmark > North Jutland (0.04)
On the Implementation of a Bayesian Optimization Framework for Interconnected Systems
González, Leonardo D., Zavala, Victor M.
Bayesian optimization (BO) is an effective paradigm for the optimization of expensive-to-sample systems. Standard BO learns the performance of a system $f(x)$ by using a Gaussian Process (GP) model; this treats the system as a black box and limits its ability to exploit available structural knowledge (e.g., physics and sparse interconnections in a complex system). Grey-box modeling, wherein the performance function is treated as a composition of known and unknown intermediate functions $f(x, y(x))$ (where $y(x)$ is a GP model), offers a solution to this limitation; however, generating an analytical probability density for $f$ from the Gaussian density of $y(x)$ is often intractable (e.g., when $f$ is nonlinear). Previous work has handled this issue by using sampling techniques or by solving an auxiliary problem over an augmented space where the values of $y(x)$ are constrained by confidence intervals derived from the GP models; such solutions are computationally intensive. In this work, we provide a detailed implementation of a recently proposed grey-box BO paradigm, BOIS, that uses adaptive linearizations of $f$ to obtain analytical expressions for the statistical moments of the composite function. We show that the BOIS approach enables the exploitation of structural knowledge, such as that arising in interconnected systems as well as systems that embed multiple GP models and combinations of physics and GP models. We benchmark the effectiveness of BOIS against standard BO and existing grey-box BO algorithms using a pair of case studies focused on chemical process optimization and design. Our results indicate that BOIS performs as well as or better than existing grey-box methods, while also being less computationally intensive.
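The adaptive-linearization idea, propagating a Gaussian $y(x)$ through $f$ via a first-order (delta-method) expansion, can be sketched generically: if $y \sim \mathcal{N}(\mu, \Sigma)$, then $\mathbb{E}[f(y)] \approx f(\mu)$ and $\mathrm{Var}[f(y)] \approx g^\top \Sigma\, g$ with $g = \nabla f(\mu)$. This is an illustration of the moment formulas, not the BOIS implementation; function and parameter names are hypothetical.

```python
import numpy as np

def linearized_moments(f, mu, Sigma, eps=1e-6):
    """Delta-method moments of z = f(y) for y ~ N(mu, Sigma):
    mean ≈ f(mu), variance ≈ g^T Sigma g with g = grad f(mu).
    The gradient is taken by central finite differences, so no
    sampling or auxiliary optimization problem is needed."""
    mu = np.asarray(mu, dtype=float)
    g = np.zeros_like(mu)
    for i in range(mu.size):
        e = np.zeros_like(mu)
        e[i] = eps
        g[i] = (f(mu + e) - f(mu - e)) / (2 * eps)
    return f(mu), g @ Sigma @ g

# example: f(y) = y0^2 + y1 with y ~ N([1, 0], diag([0.04, 0.01]))
mean, var = linearized_moments(lambda y: y[0] ** 2 + y[1],
                               np.array([1.0, 0.0]),
                               np.diag([0.04, 0.01]))
# grad at mu is [2, 1], so var ≈ 4*0.04 + 1*0.01 = 0.17 and mean ≈ 1.0
```

Because the linearization is re-taken at each candidate $x$ (hence "adaptive"), the approximation stays local, which is what makes the resulting acquisition function cheap relative to sampling-based grey-box schemes.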
- Transportation > Air (0.49)
- Energy > Oil & Gas (0.46)